Gabor Features Based Script Identification of Lines within a Bilingual/Trilingual Document

Authors

  • Rajneesh Rani
  • Renu Dhir
  • Gurpreet Singh Lehal

Abstract

OCR technology for Indian documents is still at an emerging stage, and most Indian OCR systems can read documents written in only a single script. Because many commercial and official documents of different Indian states are trilingual, identifying the script and/or language is one of the elementary tasks in multi-script document recognition. A script recognizer simplifies multilingual OCR by improving accuracy and reducing computational complexity. Script recognition may be performed at the line, word, or character level, depending on the level at which the different scripts are interleaved. This paper evaluates the effectiveness of Gabor filter banks with kNN, SVM, and PNN classifiers for identifying scripts at the line level in such trilingual documents. The experiments show that Gabor features with an SVM classifier achieve a recognition rate of 99.85% for trilingual documents.
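As a rough illustration of the kind of pipeline the abstract describes, the sketch below builds a small Gabor filter bank, summarizes each filter response by its mean and standard deviation, and classifies with a 1-nearest-neighbor rule (kNN with k = 1). It is dependency-free Python written for readability, not the authors' implementation: the kernel size, wavelength, number of orientations, the `stripes` toy images, and all function names are assumptions made for this example, and the paper's SVM and PNN classifiers are not shown.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) Gabor kernel as a size x size list of lists (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the filter's modulation direction.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that flat image regions give a zero response.
    mean = sum(sum(row) for row in kernel) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def filter_response(image, kernel):
    """Correlate image with kernel over positions where the kernel fits entirely."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    acc += image[i + a][j + b] * kernel[a][b]
            out.append(acc)
    return out


def gabor_features(image, wavelengths=(4.0,), n_orientations=4, size=7):
    """Mean and standard deviation of |response| for every filter in the bank."""
    feats = []
    for wl in wavelengths:
        for n in range(n_orientations):
            theta = n * math.pi / n_orientations
            kernel = gabor_kernel(size, wl, theta, sigma=0.5 * wl)
            mags = [abs(r) for r in filter_response(image, kernel)]
            mean = sum(mags) / len(mags)
            var = sum((m - mean) ** 2 for m in mags) / len(mags)
            feats += [mean, math.sqrt(var)]
    return feats


def classify_1nn(train, query):
    """Return the label of the training vector closest to query (Euclidean)."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]


def stripes(vertical, shift=0, n=16):
    """Toy stand-in for a text-line image: period-4 stripes in one direction."""
    return [[1.0 if (((j if vertical else i) + shift) // 2) % 2 == 0 else 0.0
             for j in range(n)] for i in range(n)]


# Two hypothetical "scripts" that differ only in dominant stroke orientation.
train = [(gabor_features(stripes(True)), "script_A"),
         (gabor_features(stripes(False)), "script_B")]
predicted = classify_1nn(train, gabor_features(stripes(True, shift=1)))
print(predicted)
```

In a real system the toy images would be replaced by segmented text-line images, and the feature vectors would typically be fed to a trained classifier such as an SVM rather than a single-neighbor lookup.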

Related resources

Handwritten Script Recognition Using DCT, Gabor Filter and Wavelet Features at Line Level

In a country like India, where many scripts are in use, automatic identification of printed and handwritten scripts facilitates many important applications, including sorting document images and searching online archives of document images. In this paper, a multiple-feature-based approach is presented to identify the script type of a collection of handwritten documents. Eight popula...

Word level script identification for scanned document images

In this paper, we compare the performance of three classifiers used to identify the script of words in scanned document images. In both training and testing, a Gabor filter is applied and 16 channels of features are extracted. Three classifiers (Support Vector Machines (SVM), Gaussian Mixture Model (GMM) and k-Nearest-Neighbor (k-NN)) are used to identify different scripts at the word level (...

Handwritten Script Identification: Fusion based Approaches

Script identification is one of the preprocessing steps in any document image processing task. Script identification in printed documents has received considerable attention, whereas script identification in handwritten documents has received less attention from the document research community. Almost all existing works have attempted to identify suitable features or classifiers for handwrit...

Script Identification from Trilingual Documents using Profile Based Features

In a multi-script environment, the majority of documents may contain text printed in more than one script/language. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the different script regions of the document. In this paper, it is proposed to develop a model to identify the script type of a trilingual document printed i...

Identification of Telugu, Devanagari and English Scripts Using Discriminating Features

In a multi-script, multilingual environment, a document may contain text lines in more than one script/language. It is necessary to identify the different script regions of the document in order to feed the document to the OCRs of the individual languages. In this context, this paper proposes a model to identify and separate text lines of Telugu, Devanagari and English scripts from a ...

Publication date: 2014